Improving Software Pipelining by Hiding Memory Latency with Combined Loads and Prefetches
Authors
Abstract
Modern processors and compilers hide long memory latencies through non-blocking loads or explicit software prefetching instructions. Unfortunately, each mechanism has potential drawbacks. Non-blocking loads can significantly increase register pressure by extending the lifetimes of loads. Software prefetching increases the number of memory instructions in the loop body. For a loop whose execution time is bound by the number of loads/stores that can be issued per cycle, software prefetching exacerbates this problem and increases the number of idle computational cycles in loops. In this paper, we show how compiler and architecture support for combining a load and a prefetch into one instruction, called a prefetching load, can give lower register pressure like software prefetching and lower load/store-unit requirements like non-blocking loads. On a set of 106 Fortran loops we show that prefetching loads obtain a speedup of 1.07–1.53 over using just non-blocking loads and a speedup of 1.04–1.08 over using software prefetching. In addition, prefetching loads reduced floating-point register pressure by as much as a factor of 0.4 and integer register pressure by as much as a factor of 0.8 over non-blocking loads. Integer register pressure was also reduced by a factor of 0.97 over software prefetching, while floating-point register pressure was increased by a factor of 1.02 versus software prefetching in the worst case.
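To make the trade-off concrete, the sketch below contrasts the three latency-hiding schemes on a simple reduction loop. This is an illustrative C sketch and not the paper's experimental code: the function names, the prefetch distance DIST, and the ldpf mnemonic in the final comment are hypothetical names chosen here for exposition, and __builtin_prefetch is the GCC/Clang intrinsic standing in for an explicit software prefetch instruction.

    #include <stddef.h>

    /* Illustrative only: real compilers perform these transformations while
     * software pipelining; the functions below just make the costs visible. */

    /* (1) Non-blocking loads: the scheduler issues the load of a[i] many
     *     cycles before the add that consumes it, so the loaded value stays
     *     live in a register across that span -- the register-pressure cost. */
    double sum_nonblocking(const double *a, size_t n)
    {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++)
            sum += a[i];        /* load hoisted early, value held in a register */
        return sum;
    }

    /* (2) Software prefetching: an explicit prefetch per iteration pulls a
     *     future cache line toward the processor.  No long-lived register is
     *     needed, but each iteration now issues two memory instructions, which
     *     hurts loops already bound by load/store issue bandwidth. */
    double sum_prefetch(const double *a, size_t n)
    {
        enum { DIST = 16 };     /* assumed prefetch distance, in elements */
        double sum = 0.0;
        for (size_t i = 0; i < n; i++) {
            /* prefetching a little past the end is tolerated by hardware;
             * a production version would clamp the prefetch address */
            __builtin_prefetch(&a[i + DIST], 0 /* read */, 3 /* keep cached */);
            sum += a[i];
        }
        return sum;
    }

    /* (3) Prefetching load (the paper's proposal): one instruction both loads
     *     a[i] for immediate use and prefetches the line holding a[i + DIST].
     *     It keeps the short register lifetime of (2) while issuing only one
     *     memory instruction per element, like (1).  Sketched as a comment,
     *     since no standard ISA exposes such an instruction to C:
     *
     *         ldpf  f0, [r1], DIST*8   ; hypothetical: load *r1 into f0 and
     *                                  ; prefetch the line at r1 + DIST*8
     */

Read this way, the reported speedup over software prefetching comes from removing the extra prefetch instruction from load/store-bound loops, while the register-pressure reduction relative to non-blocking loads comes from no longer holding loaded values in registers across long latency spans.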
Similar articles
Improving Software Pipelining by Hiding Memory Latency
Modern processors and compilers hide long memory latencies through non-blocking loads or explicit software prefetching instructions. Unfortunately, each mechanism has potential drawbacks. Non-blocking loads can significantly increase register pressure by extending the lifetimes of loads. Software prefetching increases the number of memory instructions in the loop body. For a loop whose executio...
A Software Pipelining Method Based on a Hierarchical Social Algorithm
Software pipelining is a compile-time scheduling technique that overlaps successive loop iterations to achieve instruction-level parallelism. It allows us to hide memory latency by overlapping the prefetches for a future iteration with the computation of the current iteration. This paper presents an efficient algorithm for determining the iteration bound of cyclic data flow graphs and the optim...
Effects of Main Memory Latencies on the Performance of Nonblocking Caches
Lockup-free caches in conjunction with non-blocking processor loads have been proposed to hide miss latencies in high performance processors. One problem with current approaches is the increased complexity of the processor and of the cache controller due to non-blocking. In this paper, we introduce a simple mechanism to support non-blocking loads and a lockup-free cache. A modified SPARC architec...
A comparative evaluation of software techniques to hide memory latency
Software-oriented techniques to hide memory latency in superscalar and superpipelined machines include loop unrolling, software pipelining, and software cache prefetching. Issuing the data fetch request prior to the actual need for the data allows overlap of accessing with useful computations. Loop unrolling and software pipelining do not necessitate microarchitecture or instruction set architecture ch...
A Hardware-based Cache Pollution Filtering Mechanism for Aggressive Prefetches
Aggressive hardware-based and software-based prefetch algorithms for hiding memory access latencies have been proposed to bridge the expanding speed gap between processors and memory subsystems. As smaller L1 caches prevail in deep submicron processor designs in order to maintain short cache access cycles, cache pollution caused by ineffective prefetches is becoming a major challeng...
Publication date: 2011